gh-58038: `Unicode*Error`: update args tuple on call #139274

StanFromIreland · 2025-09-23T20:03:47Z

Issue: codecs error handler is called with a UnicodeDecodeError with the same args #58038

ZeroIntensity

I think it's very likely that people are relying on this at this point.

ZeroIntensity · 2025-09-24T12:57:46Z

Lib/test/test_exceptions.py

                 'start' : 0, 'reason' : 'ordinal not in range'}),
            (UnicodeDecodeError, ('ascii', bytearray(b'\xff'), 0, 1,
                                  'ordinal not in range'), {},
-                {'args' : ('ascii', bytearray(b'\xff'), 0, 1,


This is setting off some alarm bells for me. If we're changing tests like this, we're probably breaking something.

This seems like a bug fix, there is a mismatch:

>>> exc = UnicodeDecodeError('ascii', bytearray(b'\xff'), 0, 1, 'ordinal not in range')
>>> exc.args  # index 1 is the object
('ascii', bytearray(b'\xff'), 0, 1, 'ordinal not in range')
>>> exc.object
b'\xff'
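To make the mismatch concrete, here is a minimal sketch, assuming current CPython behavior as described in this thread (that is, without the change proposed in this PR). The variable names are illustrative only. It shows that `args` keeps the constructor arguments verbatim, while the named attributes are normalized and can change independently.

```python
# Constructor arguments are kept verbatim in args, while the named
# attributes are normalized and can be modified independently.
exc = UnicodeDecodeError('ascii', bytearray(b'\xff'), 0, 1,
                         'ordinal not in range')

args_object = exc.args[1]      # still the bytearray that was passed in
attr_object = exc.object       # converted to bytes by the constructor

exc.reason = 'changed later'   # mutate a named attribute
args_after = exc.args          # args is not refreshed by the assignment

print(type(args_object).__name__, type(attr_object).__name__)
print(exc.reason, '|', args_after[4])
```

Whether this divergence is a bug or part of the `args` contract is exactly what the rest of the discussion is about.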

bedevere-app · 2025-09-24T12:58:21Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

picnixz

Don't change this. People might rely on this behavior and I didn't want to change this (see also #58038 (comment)).

In general .args contains the arguments that were passed at construction time. The same holds for other exceptions: https://docs.python.org/3/library/exceptions.html#BaseException.args.

StanFromIreland · 2025-09-24T14:39:35Z

Closed the issue.

vstinner · 2025-09-24T17:48:30Z

> I think it's very likely that people are relying on this at this point.

Well, the number of 3rd party codecs should be pretty low, and this change is a backward incompatible change on purpose: it fixes an old bug.

picnixz · 2025-09-24T18:14:19Z

But it is not the documented behavior (or this should be re-documented for UnicodeError explicitly; maybe it is documented elsewhere and I didn't find it, as I am no longer in my dev session). Users should rely on the non-args value and instead access the named properties I think.

vstinner · 2025-09-25T10:23:08Z

> Users should rely on the non-args value and instead access the named properties I think.

I agree with that. But that doesn't prevent us from fixing the exc.args member, which is an old Python bug.

args is not documented explicitly in UnicodeError: https://docs.python.org/dev/library/exceptions.html#UnicodeError

It's documented on the BaseException: https://docs.python.org/dev/library/exceptions.html#BaseException.args
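As an illustration of relying on the named attributes rather than on `args`, here is a minimal sketch of a codec error handler. The handler function and the registered name `hex_marker` are hypothetical, chosen for this example; `codecs.register_error` and the `object`, `start`, and `end` attributes are the standard API.

```python
import codecs


def replace_with_hex(exc):
    """Hypothetical handler: replace undecodable bytes with <xx> markers."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Read the failing slice from the named attributes, not from exc.args,
    # since args may not reflect the current state of the exception.
    bad = exc.object[exc.start:exc.end]
    replacement = ''.join(f'<{byte:02x}>' for byte in bad)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end


# 'hex_marker' is an illustrative name, not a built-in error handler.
codecs.register_error('hex_marker', replace_with_hex)

result = b'a\xffb\xfec'.decode('ascii', errors='hex_marker')
print(result)
```

A handler written this way behaves the same whether or not `args` is kept in sync with the attributes.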

picnixz · 2025-09-25T11:09:46Z

Yes, but the fact that it says:

> The tuple of arguments given to the exception constructor

makes me feel that we should not break this part of the contract. That is, args should contain the original arguments tuple, and it doesn't matter how we change the attributes.

OTOH, we could maybe have an astuple() method that would return the args as one could expect. It would be a new method, though, whereas changing the meaning of args is risky. Now, after re-reading the issue, I also spotted the following in the PEP:

> Should further encoding errors occur, the encoder is allowed to reuse the exception object for the next call to the callback.

So, re-using the same exception object should be allowed. Thus, I wouldn't consider the current behavior an "old" bug.

Note that a similar argument could be made for OSError. We can change the errno but args is not updated. Also, OSError.args only contains the first two arguments for backward compatibility (it's documented as such). I would rather remind users that re-using UnicodeError exceptions does not update args.
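The OSError comparison can be checked with a short sketch. The file name and the errno values are arbitrary, chosen for illustration.

```python
# Three constructor arguments, but args keeps only the first two
# (documented backward-compatibility behavior of OSError).
err = OSError(2, 'No such file or directory', 'missing.txt')
args_before = err.args

# Changing a named attribute afterwards does not touch args.
err.errno = 13
args_after = err.args

print(args_before, args_after, err.errno, err.filename)
```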

StanFromIreland added 3 commits September 23, 2025 20:23

Commit

9cecc71

Commit

ba84635

Clean up

040ed5c

StanFromIreland requested a review from vstinner September 23, 2025 20:03

StanFromIreland requested a review from iritkatriel as a code owner September 23, 2025 20:03

bedevere-app bot added the awaiting review label Sep 23, 2025

bedevere-app bot mentioned this pull request Sep 23, 2025

codecs error handler is called with a UnicodeDecodeError with the same args #58038

Closed

ZeroIntensity requested changes Sep 24, 2025

bedevere-app bot added awaiting changes and removed awaiting review labels Sep 24, 2025

picnixz requested changes Sep 24, 2025

StanFromIreland closed this Sep 24, 2025

StanFromIreland deleted the unicode-errors branch September 24, 2025 14:39

4 participants
